Reading - Curious machines

Greg Detre

Tuesday, April 08, 2003

Killeen, "Modeling games from the 20th century", in Behavioural Processes

is Behavioural Processes a reputable journal???

are there many other behaviourist publications??? what's its mandate???

From palm

This was a curious paper, not least of all because I still don't know what it's about.

Complementarity theory - "complementarity occurs whenever some quantity is conserved". This might be knowledge (as in the case of position and momentum at the quantum??? level), field of view vs magnification???, mental resources, physical resources, precision vs clarity/amount of data needed/expedience...

In the case of our processing abilities, we need to be aware of complementarity to avoid summarily dismissing theories because they are too complex, or too superficial.

"Scientists have yet to develop a set of techniques for changing the field of view of a theory while guaranteeing connectedness through the process: theoretical depth of focus is discrete, not continuous." Is there any hope that such domain-general techniques could ever exist??? Is this not simply another way of restating the epistemological problem of (weak) emergence - that is, we cannot keep track of many interacting components (even if the rules of their interactions are simple, and especially if they are weak)? Yes, this is what he says when he writes "non-linear interactions, however, give rise to 'emergent phenomena'". I like the idea that all models should ideally demonstrate that they preserve phenomena one level up and one level down. Can we clarify this notion of level???

I need to look up conditioning, reinforcement, operant vs classical, multiple schedules and Garcia??? see pg 34.

Complementarity of explanation??? pg 35

I like the idea of many types of explanation. Is there any problem/misreading in his paraphrase of 'cause' with 'explanation'???

Write to a Greek scholar re this.

Aristotle's four -be-causes:

- efficient - occur before an event and/or(???) trigger it - sufficient causes - , or their absence prevents it - necessary causes -. Usual sense of 'cause'.

Skinner's variables of which behaviour is a function???

- material - substrates, underlying mechanisms. "Assertions that they are the best or only kind of explanation is reductionism".

- final - the reason an entity/process exists, what it does that has justified its existence.

Skinner's selection by consequences???

"Assertion that final causes are time-reversed efficient causes is teleology: results cannot bring about their efficient causes."

When you ask what a strange machine does, you are seeking a final cause. Try and fit with Marr's three levels??? Final causes can be proximal or distant, e.g. evolutionary pressures vs a history of reinforcement or intentions.

- formal causes - analogues (sp???), metaphors and models, the structures with which we represent phenomena and which permit us to predict and control them, e.g. the syllogism, the differential equation, or the molecular model.

Skinnerian three-term contingency???

"all understanding involves finding an appropriate formal cause - that is, mapping phenomena to explanations having a similar structure to the thing explained." Hmmm. How is this different from the material cause??? This may be what we mean by internalisation, or the process by which we learn, but 'understanding' implies both these - very similar - senses, but also the process by which we maintain a conceptual representation. After all, when we understand a concept, we may want to then use it later as an analogue for some even newer concept - we don't necessarily do this each time by recursive reference to the previous underlying concept or analogue. If nothing else, this would lead to an infinite regress...

Interesting: causal, reductive, functional and formal explanations...

Could the difference between material and formal causes lie in the formality of their respective representations??? Or is it that the material is to do with actual physical types of substrates/materials, whereas the formal is to do with the actual algorithms it performs, while the final is to do with why it performs those algorithms or what role the inputs and outputs play in some larger system, i.e. in terms of why and purpose rather than how???

"A formal explanation proceeds by apprehending the event to be explained and placing it in correspondence with a model". Model identifies necessary or sufficient antecedents for the event. If those are found in the empirical realm, the phenomenon is said to be explained. Confounds must also be eliminated.

Find out more about Behavioural Processes??? Are there really still behaviourists around??? Is there any way to restate their position to make it plausible, less strong??? Could it be that it's still worth talking occasionally in these behaviourist terms if we're just thinking about one agent/component among many which happens to use some behaviourist learning paradigms???

Analog vs analogy - etymology???

Distinction between post- and pre-diction.

Control: the user of a model introduces a variable known to bring about a certain effect.

Explanation is "thus the provision of a model with -a- the events to be explained as a consequent, and -b- events noticeable in the environment as antecedents". Isn't this just material/efficient explanation though???

Control is "the arrangement of antecedents in the context of a model that increases the probability of desired consequences"

Truth is "a state of correspondence between models and data". But you need to specify both the model and the data it attempts to map. Hmmm??? "Finding ways to make models applicable to apparently diverse phenomena is part of the creative action of science." You not only have to map the variables to their empirical instantiations, but also the operators.

"Life is sacred, except in war; war is bad, except when fought for justice; justice is good, except when untempered by humanity".

Truth is binary, but precision is graded

jejune???

False models may apparently be more useful than true ones, e.g. Newtonian mechanics. Hmm. "It is trivial to show a model false; restricting the domain of the model or modifying the form of the model to make it truer is the real accomplishment."

modeling tools are formal structures, whereas models are such tools applied to a data domain.

"Behavioural science is a search in the empirical domain for the variables of which behaviour is a function; and a search in the theoretical domain for the functions according to which behaviour varies (= models)"???

Don't like his definition of categorisation - although it's not obvious that it is a definition of all categorisation. See pg 38.

Discussion of reinforcement and the law of effect???

Skinner: the reflex must be viewed in set-theoretic terms. Reinforcement acts to strengthen movements of a similar kind.

"Operant responses are those movements whose occurrence is correlated with prior (discriminative) stimuli and subsequent (reinforcing) stimuli."

reinforce a reminder using language - interesting... Pg 39

sets vs family resemblances - can behaviourism deal with these???

Operant???

Blocking???

Figure 2??? Don't you need to represent these learning paradigms with before and after shots???

All probabilities are conditional on a universe of discourse - our goal is to define the relevant universe - context - the same way as the animal does, so that our model predicts its behaviour.

Ford effect in politics - pg 42

Are the formulations he gives of Bayes theorem the same as the normal ones??? Don't they usually involve intersection???
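On the Bayes question: a short check, using only the standard definition of conditional probability (not Killeen's own notation, which I haven't reproduced here), that the "conditional" form and the "intersection" form say the same thing. H (hypothesis) and D (data) are placeholder symbols.

```latex
P(H \mid D) = \frac{P(H \cap D)}{P(D)}, \qquad
P(D \mid H) = \frac{P(H \cap D)}{P(H)}
\;\Longrightarrow\;
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
```

So a formulation without an explicit intersection has probably just substituted P(D|H)P(H) for P(H ∩ D).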

Considers the value of a Bayesian inference, given the probability of the antecedents...??? Pg 43

"What is the probability that the model in question accounts for more of the variance in the empirical data than does some other model of equal complexity?"

Am I being stupid about the axes on pg 44???

Shepard's non-metric multidimensional scaling???

Change is fundamental, and calculus is the language of change

"Behaviour is change in stance over time." Skinner: behaviour is "the movement of an organism within a frame of reference"

how interesting is this to me as a non-behaviourist??? what's neo-behaviourism??? Is he a neo-behaviourist??? Is that a purely philosophical position on the mind-body problem???

"Finding the right balance in life is finding the point at which life-satisfaction has zero-derivatives", i.e. the factors which affect life-satisfaction cannot be increased without affecting another negatively, right??? Pg 45

elegant generalisations in calculus: not to find a point on a function that minimises some value, but rather to find a -function- that minimises some value, e.g. the brachistochrone problem - what shape of surface will cause a ball rolling down it to arrive at the bottom in minimum time? Posed by Bernoulli; five solutions, including Newton's, Bernoulli's own, his brother's, one of their students' and Leibniz's
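A rough numerical illustration of the brachistochrone idea, assuming a point mass sliding without friction (so ignoring the rotation of a real ball) and an assumed g of 9.81 m/s^2. It compares a straight ramp with the cycloid, the known solution, between the same two endpoints; the function names are made up for this sketch.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def straight_line_time(x1, y1, g=G):
    """Time to slide from rest down a frictionless straight ramp with run x1 and drop y1."""
    length = math.hypot(x1, y1)
    accel = g * y1 / length          # component of gravity along the ramp
    return math.sqrt(2 * length / accel)

def cycloid_time(r, theta1, g=G):
    """Time along the cycloid x = r(t - sin t), y = r(1 - cos t) from t = 0 to t = theta1."""
    return theta1 * math.sqrt(r / g)

r = 1.0
theta1 = math.pi                      # endpoint at the lowest point of the arch
x1 = r * (theta1 - math.sin(theta1))  # horizontal distance covered
y1 = r * (1 - math.cos(theta1))       # vertical drop

t_line = straight_line_time(x1, y1)
t_cycloid = cycloid_time(r, theta1)
print(f"straight ramp: {t_line:.3f} s, cycloid: {t_cycloid:.3f} s")
```

The cycloid comes out faster even though it is the longer path, which is the point: the thing being optimised over is the whole curve, not a single number.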

Decision theory "is a way of combining measures of stimuli and reinforcers to predict which response will be most strongly reinforced"

is this what learning in real brains is actually about though??? pg 47

fig 6: "the multi-dimensional signal-detection/optimisation problem faced by real organisms: how to access a particular reinforcer"

closed loop environments - when actions change the environment, which in turn changes future actions

open vs closed loop??? open is where your actions don't affect the world feeding back into your sensorium, right??? I think so.

If signalling is possible, players will seek signs of character - that is, predictors of future behaviour - signal detection becomes a survival skill.

"Organisms such as rats and humans may be viewed as finite-state automata, differing primarily in the amount of memory that is available to them. This statement does not mean that they are nothing but automata." When he says that this doesn't mean they're nothing but automata, what else??? Perhaps he's saying that they could also be modelled as more powerful types of automata... No, I don't think so.

"More memory means more capacity to retain and relate conditional probabilities. Enhanced ability to conditionalise permits nuanced reactions." No no no.

"The mind of science may be claimed by philosophy, but its heart belongs to tinkerers and problem-solvers."

"The difference between scientists and anagram fans is the idea that scientific problems are part of a larger puzzle set; that one puzzle solved may make other pieces fall into place. But then jigsaw puzzles have that feature too." Hmmm. But they don't make other jigsaw puzzles fall into place. Hmmm, well, science is compartmentalised to some degree too. That's what abstraction is all about. I don't really understand his point about the jigsaw puzzles, other than maybe as a self-deprecation about the appositeness of his own analogy.

"The goal of science is not perfect models, because the only perfect renditions are the phenomena sui generis; the goal is better models". Hmm, isn't this begging the question, i.e. that there is no such thing as a perfect model???

"The more laconic a model, the more likely we can extrapolate its prediction to new situations without substantial tinkering. The most succinct models are called elegant."

Argh. He's got the apostrophe in 'it's' wrong. Pg 50

Discarded

Questions

Blumberg, "Integrated learning for interactive synthetic characters"

"dogs are only able to learn causality if the events, actions and consequences are proximate in space and time, and as long as the consequences are motivationally significant"

animals and their trainers act as a coupled system to guide the animal's exploration of its state, action and state-action spaces

take advantage of predictable regularities

maximal use of any supervisory signals (implicit or explicit)

make them easy to train by humans

the synthetic dog mimics some of a real dog's ability to learn including:

the best action to perform in a given context

what form of a given action is most reliable in producing reward

the relative reliability of its actions in producing a reward and altering its choice of action accordingly

to recognise new and valuable contexts such as acoustic patterns

to synthesise new actions by being "lured" into novel configurations or trajectories by the trainer

the behavioural architecture is one in which learning can occur, rather than an architecture that solely performs learning???

"integrated approach to state, action and state-action space discovery within the context of reinforcement learning and an articulation of heuristics and design principles that make learning practical for synthetic characters"

most approaches to generating motor primitives focus on learning "how to move" subject to some criteria such as energy minimisation

they (Blumberg et al.) focus on learning the "value with respect to a motivational goal of moving in a certain way"

state = "a specific, hopefully useful configuration of the world as sensed by the creature's entire sensory system. As such, state can be thought of as a label that is assigned to a sensed configuration. The space of all represented configurations of the world is the state space"

action = "how a creature can affect the state of its world" - finite set of actions, one at a time; action space is the set of all possible actions

state/action pair = <S/A>, relationship between a state S and an action A. "typically accompanied by some numerical value, e.g. future expected reward, that indicates how much benefit there is in taking the action A when the creature senses state S. Based on this relationship a policy is built, which represents a probability with which the creature selects an action given a specific state"

credit assignment = "the process of updating the associated value of a state-action pair to reflect its apparent utility for ultimately receiving reward"
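To pin down the vocabulary above (state, action, state-action value, policy, credit assignment), here is a minimal tabular sketch. It uses a textbook Q-learning update and a softmax policy, which is my assumption for illustration and not necessarily what Blumberg et al. implement; the state and action names and the constants are invented.

```python
import math
import random
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount on future reward (assumed)

# Value table: (state, action) -> estimated future expected reward.
q = defaultdict(float)

def policy(state, actions, temperature=1.0):
    """Softmax policy: probability of selecting each action given the sensed state."""
    weights = [math.exp(q[(state, a)] / temperature) for a in actions]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

def choose(state, actions):
    """Sample one action according to the policy."""
    probs = policy(state, actions)
    return random.choices(list(probs), weights=list(probs.values()))[0]

def assign_credit(state, action, reward, next_state, actions):
    """Credit assignment: move the pair's value towards reward plus discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])

actions = ["sit", "bark"]
for _ in range(50):
    assign_credit("hears_sit_cue", "sit", 1.0, "idle", actions)   # sitting on cue is rewarded
    assign_credit("hears_sit_cue", "bark", 0.0, "idle", actions)  # barking is not
probs = policy("hears_sit_cue", actions)
print(probs)
```

After the loop the policy prefers "sit" in the "hears_sit_cue" state; `choose` then samples from that distribution rather than always taking the best action, so some exploration remains.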

animals are biased to learn proximate causality

Leyhausen suggests that the individual actions may be largely self-reinforcing, rather than being reinforced via back-propagation

does Q-learning work in a dynamic environment???

clicker training with real animals???

clicker training:

1. create an association between the sound of a toy clicker and a food reward - then use the click sound to "mark" behaviours that they wish to encourage

animals assume that an action/stimulus immediately preceding a motivationally significant consequence is "as good as causal"

easy to provide immediate feedback - bridges the gap between the dog earning and receiving the reward

2. in order to get the dog to first produce the desired behaviour so that it can be rewarded, the trainer encourages the dog to perform specific behaviours, e.g. training the dog to touch an object such as the trainer's hand or a "target stick", luring the dog through a trajectory or into a pose

the animal can learn to associate reward with its resulting body configuration/trajectory, and not just the action of following its nose

shaping = when the trainer guides the dog towards the desired behaviour by rewarding ever-closer approximations

3. add a discriminative stimulus (e.g. gesture or vocal cue), usually trained by being issued just after the animal has started to perform the action

teaching the action first and then the cue is unlike other training techniques

temporal window around an action's onset - through variations in how the action is performed and by attending to correlations between the action's reliability in producing reward and the state of contemporaneous stimuli, they are performing local search in a potentially valuable neighbourhood

the state and action spaces often contain a natural hierarchical organisation that facilitates the search process

need to be able to train with just observable behaviour, without looking at internal state

animals seem to build models of important sensory cues "on demand", using rewarded actions as the context for identifying important sensory cues and for guiding the perceptual model of the cue

discover, based on experience, those patterns or motions that seem to matter and add them dynamically to their respective spaces - state space discovery and action space discovery

use the context of a rewarded action to facilitate the classification process

during luring, animals delegate credit from the "follow your nose" state-action pair to another pair

hierarchical representation of state space - then we can "notice" that a given action is more reliable when a whole "class" of states is active - further exploration/refinement within that class of states might be fruitful

state space is represented by a percept tree

percepts are atomic perception units, with arbitrarily complex logic, whose job it is to recognise and extract features from the raw sensory data

if a percept is activated, the sensory data is passed recursively to the percept's children for more specific classification
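My reading of the percept tree, as a sketch: each percept has some arbitrary matching logic, and only if it fires does the data get handed down to its children. The class, the `matches` predicates and the example tree are all invented here for illustration; the paper's percepts will be richer than boolean tests.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Percept:
    name: str
    matches: Callable[[Any], bool]            # arbitrary recognition logic
    children: List["Percept"] = field(default_factory=list)

    def classify(self, data: Any) -> List[str]:
        """Return names of all active percepts in this subtree, most general first."""
        if not self.matches(data):
            return []          # inactive: children never see the data
        active = [self.name]
        for child in self.children:
            active.extend(child.classify(data))
        return active

# Hypothetical tree: anything -> sound -> specific utterances.
tree = Percept("anything", lambda d: True, [
    Percept("sound", lambda d: d.get("kind") == "sound", [
        Percept("utterance_sit", lambda d: d.get("label") == "sit"),
        Percept("utterance_down", lambda d: d.get("label") == "down"),
    ]),
    Percept("shape", lambda d: d.get("kind") == "shape"),
])

print(tree.classify({"kind": "sound", "label": "sit"}))
# prints ['anything', 'sound', 'utterance_sit']
```

The pruning is what makes the search heuristic: a whole subtree is skipped as soon as its parent fails to match.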

how are new percepts generated???

percepts similar to codelets???

hierarchically organised though

fuck - have they already done what I wanted to do???

are there action equivalents of codelets???

cepstral coefficients???

motion percepts use a model that represents a path through the space of possible motions - eh???

in RL terminology, a percept refers to a subset of the entire current space

percept decomposition of state allows for a heuristic search through potentially intractable state and state-action spaces

but it makes learning conjunctions of features harder - why???

because they fit into more than one place in the hierarchy???

actions = identifiable patterns of motion through time

treating actions as verbs allows you to treat the action as a label

but isn't good for the type of action space discovery needed for luring

instead, use a pose space that contains all of its possible body configurations, and an action is a path through this space

nodes in the pose graph are connected together in tangled, directed, weighted graphs

can calculate a distance metric between two paths that measures their similarity
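I don't know which path metric the paper actually uses. As one plausible candidate, here is a dynamic time warping (DTW) sketch, which scores two pose sequences as similar even when one is performed more slowly. Poses are assumed to be equal-length tuples of joint angles; all names and data are hypothetical.

```python
import math

def pose_distance(p, q):
    """Euclidean distance between two poses given as equal-length joint-angle tuples."""
    return math.dist(p, q)

def path_distance(path_a, path_b):
    """Dynamic time warping cost between two pose sequences; 0 means the same path."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = pose_distance(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

sit = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.4)]
slow_sit = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.2), (1.0, 0.4)]  # same path, lingering at the start
beg = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(path_distance(sit, slow_sit), path_distance(sit, beg))
```

The slow version of the sit scores as identical to the original, while the different motion scores as further away, which is roughly what you would want for recognising a lured trajectory as a variant of a known action.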

the representation of a particular state-action pair is an action tuple:

what to do

when

to what

for how long

why

like an augmented state-action pair in which the state information is provided by an associated percept (when), the action (what) is the label for a given path through pose space

action tuples are organised into groups and compete probabilistically for action based on value and applicability - eh???

use "action tuple" and "percept-action pair" interchangeably
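To make the action tuple and the "compete probabilistically" bit concrete for myself: a sketch in which each tuple carries the five slots (what, when, to what, how long, why), applicability means its trigger percept is currently active, and the winner is drawn with probability proportional to value. The field names and the proportional-to-value rule are my assumptions, not taken from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class ActionTuple:
    trigger: str      # when: the percept that must be active
    action: str       # what to do: label for a path through pose space
    target: str       # to what: the object of the action
    duration: float   # for how long, in seconds
    value: float      # why: learned estimate of reward

def compete(group, active_percepts, rng=random):
    """Pick one applicable tuple with probability proportional to its value."""
    applicable = [t for t in group if t.trigger in active_percepts and t.value > 0]
    if not applicable:
        return None
    return rng.choices(applicable, weights=[t.value for t in applicable])[0]

group = [
    ActionTuple("utterance_sit", "sit", "self", 2.0, value=0.9),
    ActionTuple("sound", "look_at", "trainer", 1.0, value=0.3),
    ActionTuple("shape", "sniff", "object", 1.5, value=0.5),
]
winner = compete(group, {"anything", "sound", "utterance_sit"})
print(winner.action)
```

Here "sniff" can never win because its trigger percept is inactive, and "sit" wins more often than "look_at" because of its higher value.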

the idea of specificity of action being a bonus is interesting. Minsky considers it as a criterion between Critics...

how are the more specific children of states created???

using a classifier, that uses the reward context to help with classification

how does it decide when to create a new state though???

cf parallel terraced scan??? basically, a focused search, right???

well, y, but it's clever the way they've found in advance the potentially valuable areas

state-action space discovery

the system is initially populated with only a few percept-action pairs (i.e. action tuples) that represent general world states (i.e. reference percepts at the top of the percept tree)

specialisation = over time, new percept-action pairs are added as the system gathers evidence that a promising action associated with a given state might be made even more reliable if associated with a specific child of the state

in order for specialisation to occur (during the credit assignment phase)

the value of the percept-action pair has to be above a threshold, i.e. evidence that it would be valuable

it must have a child whose reliability/novelty is above a threshold
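The two conditions for specialisation, written out as a check. The thresholds are invented numbers, and I have collapsed "reliability/novelty" into a single reliability score per child, which is a simplification on my part.

```python
VALUE_THRESHOLD = 0.5        # assumed
RELIABILITY_THRESHOLD = 0.7  # assumed

def children_to_specialise(pair_value, child_reliability):
    """Return child percepts that warrant a new, more specific percept-action pair."""
    if pair_value <= VALUE_THRESHOLD:
        return []  # parent pair has not yet shown that it is valuable
    return [child for child, reliability in child_reliability.items()
            if reliability > RELIABILITY_THRESHOLD]

# Hypothetical numbers: the pair (sound, sit) is valuable and one child is reliable.
print(children_to_specialise(0.8, {"utterance_sit": 0.9, "utterance_down": 0.2}))
# prints ['utterance_sit']
```

Each returned child would become the "when" of a new percept-action pair for the same action.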

an unsupervised technique such as k-means clustering can be employed to partition the observed patterns into distinct clusters or classes - each cluster/class represents a region of the state space

instead, they treat all patterns that occur contemporaneously with an action that directly leads to a reward as belonging to the same cluster
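A sketch of the contrast with k-means as I understand it: instead of discovering clusters from the data alone, reward acts as the label, so every pattern that coincided with a rewarded action goes into the same class. The class name, the centroid model and the feature vectors are all made up for illustration.

```python
import math

class RewardMarkedClass:
    """One class of sensory patterns, built only from examples that coincided with reward."""

    def __init__(self):
        self.examples = []

    def add(self, pattern, rewarded):
        if rewarded:                      # reward acts as the class label
            self.examples.append(pattern)

    def centroid(self):
        n = len(self.examples)
        return tuple(sum(dim) / n for dim in zip(*self.examples))

    def distance(self, pattern):
        """Distance from a new pattern to the class centre; small means 'probably this cue'."""
        return math.dist(self.centroid(), pattern)

sit_cue = RewardMarkedClass()
sit_cue.add((1.0, 0.0), rewarded=True)
sit_cue.add((0.8, 0.2), rewarded=True)
sit_cue.add((0.0, 1.0), rewarded=False)   # heard, but no reward followed: ignored
print(sit_cue.centroid(), sit_cue.distance((0.9, 0.1)))
```

No choice of k is needed and no iteration: the trainer's reward does the partitioning that k-means would otherwise have to guess at.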

are they able to build training scripts, so that new dog-brains can be brought up to some base level of expertise and evaluated with some standard tests???

Discarded

Questions

SVM??? support vector machines

how do they work???